Re-Evaluating the Netflix Prize - Human Uncertainty and its Impact on Reliability
Authors
Abstract
In this paper, we examine the statistical soundness of comparative assessments within the field of recommender systems in terms of reliability and human uncertainty. From a controlled experiment, we get the insight that users provide different ratings on same items when repeatedly asked. This volatility of user ratings justifies the assumption of using probability densities instead of single rating scores. As a consequence, the well-known accuracy metrics (e.g. MAE, MSE, RMSE) yield a density themselves that emerges from convolution of all rating densities. When two different systems produce different RMSE distributions with significant intersection, then there exists a probability of error for each possible ranking. As an application, we examine possible ranking errors of the Netflix Prize. We are able to show that all top rankings are more or less subject to high probabilities of error and that some rankings may be deemed to be caused by mere chance rather than system quality.
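The idea of an RMSE distribution and a resulting ranking error probability can be illustrated with a small simulation. The sketch below is not the paper's method: the abstract describes a convolution of rating densities, whereas this example swaps in Monte Carlo sampling, and it assumes each rating follows a Gaussian with a known mean and standard deviation. The function names, the example predictions, and all numeric values are hypothetical and chosen only for illustration.

```python
import math
import random


def sample_rmse_pair(pred_a, pred_b, means, stds, n_draws=2000, seed=0):
    """Draw RMSE samples for two systems under uncertain ratings.

    Assumption (not from the paper): rating i ~ Normal(means[i], stds[i]).
    Both systems are scored against the same drawn ratings in each round.
    """
    rng = random.Random(seed)
    n = len(means)
    rmse_a, rmse_b = [], []
    for _ in range(n_draws):
        # One plausible "true" rating vector for this round.
        ratings = [rng.gauss(m, s) for m, s in zip(means, stds)]
        rmse_a.append(math.sqrt(sum((r - p) ** 2 for r, p in zip(ratings, pred_a)) / n))
        rmse_b.append(math.sqrt(sum((r - p) ** 2 for r, p in zip(ratings, pred_b)) / n))
    return rmse_a, rmse_b


def ranking_error_probability(rmse_a, rmse_b):
    """Fraction of draws in which system A has a larger RMSE than system B,
    i.e. an estimate of how often ranking A ahead of B would be wrong."""
    return sum(a > b for a, b in zip(rmse_a, rmse_b)) / len(rmse_a)


# Hypothetical data: five items, two systems with similar predictions.
means = [4.0, 3.0, 5.0, 2.0, 3.5]
stds = [0.7, 0.7, 0.5, 0.9, 0.6]
pred_a = [3.9, 3.2, 4.8, 2.3, 3.4]
pred_b = [3.7, 3.3, 4.7, 2.4, 3.2]

rmse_a, rmse_b = sample_rmse_pair(pred_a, pred_b, means, stds)
print("estimated P(ranking A ahead of B is wrong):",
      ranking_error_probability(rmse_a, rmse_b))
```

When the two RMSE sample sets overlap heavily, the estimated probability moves away from 0, which is the situation the abstract describes for closely ranked systems.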
Similar resources
Toward More Diverse Recommendations: Item Re-ranking Methods for Recommender Systems
Recommender systems are becoming increasingly important to individual users and businesses for providing personalized recommendations. However, while the majority of algorithms proposed in recommender systems literature have focused on improving recommendation accuracy (as exemplified by the recent Netflix Prize competition), other important aspects of recommendation quality, such as the divers...
Bennett Netflix 100 Winchester Circle
INTRODUCTION The KDD Cup is the oldest of the many data mining competitions that are now popular [1]. It is an integral part of the annual ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD). In 2007, the traditional KDD Cup competition was augmented with a workshop with a focus on the concurrently active Netflix Prize competition [2]. The KDD Cup itself in 2007 con...
The Ethics for Evaluating with an Emphasis on the Quranic Teachings
Evaluating performances, particularly that of researchers, has been considered as one of the most important problems in the fields of education and research. In many cases, even one score in evaluating a work would lead to getting or missing a prize unjustly. There can be found some Quranic teachings in the field to solve the problem. Paying attention to personal rights of those being criticize...
The Netflix Prize
In October, 2006 Netflix released a dataset containing 100 million anonymous movie ratings and challenged the data mining, machine learning and computer science communities to develop systems that could beat the accuracy of its recommendation system, Cinematch. We briefly describe the challenge itself, review related work and efforts, and summarize visible progress to date. Other potential uses...
Matrix factorization for the Netflix Prize
I compare two common techniques to compute matrix factorizations for recommender systems, specifically using the Netflix prize data set. Accuracy, run-time, and scalability are discussed for stochastic gradient descent and non-linear conjugate gradient.
Journal title:
- CoRR
Volume abs/1706.08866, Issue
Pages -
Publication date 2017